Credit Card Fraud Detection

Anomaly detection is a classification process in which rare items, events, or observations in data sets are identified. In this article, we investigate the Credit Card Fraud Detection dataset from Kaggle.com.

Context

Credit card companies must be able to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase.

Content

The dataset contains transactions made by credit cards in September 2013 by European cardholders. It covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions. It contains only numerical input variables, which are the result of a PCA transformation. Unfortunately, due to confidentiality issues, we cannot provide the original features and more background information about the data. Features V1, V2, … V28 are the principal components obtained with PCA; the only features that have not been transformed with PCA are 'Time' and 'Amount'. Feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. Feature 'Amount' is the transaction amount; this feature can be used for example-dependent cost-sensitive learning. Feature 'Class' is the response variable and takes the value 1 in case of fraud and 0 otherwise.
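To make the imbalance concrete, the short calculation below recomputes the fraud share from the two counts quoted in the description. It is only an arithmetic sketch; the variable names are ours and nothing is read from the actual CSV file.

```python
# Counts quoted in the dataset description.
n_fraud = 492
n_total = 284_807

# Share of transactions that belong to the positive (fraud) class.
fraud_rate = n_fraud / n_total
print(f"Fraud share: {fraud_rate:.4%}")

# A model that always predicts "not fraud" would already reach this accuracy,
# which is why plain accuracy says little on this dataset.
baseline_accuracy = 1 - fraud_rate
print(f"Accuracy of an always-negative baseline: {baseline_accuracy:.4%}")
```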

Initial Analysis

From the above dataset, we can visualize the following features.

Time vs Amount of Transactions

Data Distribution

Transaction Class Distribution

Let's look at the transaction class distribution.

The dataset is quite large, so we use the pandas DataFrame sample method to work with one-tenth of the data.
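A minimal sketch of this sampling step is shown below. The DataFrame here is a small synthetic stand-in (hypothetical values and only three columns), and the choice of random_state is our assumption, added so that the draw is reproducible.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the Kaggle table (hypothetical values).
rng = np.random.default_rng(0)
n_rows = 10_000
df = pd.DataFrame({
    "Time": np.sort(rng.uniform(0, 172_800, n_rows)),   # seconds over two days
    "Amount": rng.exponential(88.0, n_rows),
    "Class": (rng.random(n_rows) < 0.01).astype(int),   # rare positive class
})

# Keep one-tenth of the rows; random_state makes the draw reproducible.
sample = df.sample(frac=0.1, random_state=42)

print(len(df), len(sample))
print(sample["Class"].value_counts(normalize=True))
```

A plain random sample does not guarantee that the fraud share stays the same, so it is worth comparing the class proportions of the sample against the full data before modeling.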

X and y sets

First off, let's define $X$ and $y$ sets.
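One way to do this, assuming the column names from the dataset description ('Class' as the response, everything else as a feature), is sketched below on a tiny hypothetical frame.

```python
import pandas as pd

# Tiny hypothetical frame with the same column roles as the dataset.
df = pd.DataFrame({
    "Time": [0.0, 1.0, 2.0, 4.0],
    "V1": [-1.36, 1.19, -1.36, -0.97],
    "V2": [-0.07, 0.27, -1.34, -0.19],
    "Amount": [149.62, 2.69, 378.66, 123.50],
    "Class": [0, 0, 1, 0],
})

# X holds every column except the response; y is the response itself.
X = df.drop(columns=["Class"])
y = df["Class"]

print(X.shape, y.shape)
```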

Features with high variance

Moreover, high variance in some features can hurt the modeling process. For this reason, we standardize the features by removing the mean and scaling to unit variance. In this article, we demonstrated the benefits of scaling data using StandardScaler().
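The effect of StandardScaler() can be illustrated on synthetic columns whose scales differ as widely as a PCA component, 'Amount', and 'Time' might. The data below is made up for illustration only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Three synthetic columns on very different scales (hypothetical data).
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(0.0, 1.0, 500),       # PCA-like component
    rng.exponential(88.0, 500),      # Amount-like column
    rng.uniform(0, 172_800, 500),    # Time-like column
])

# Remove the per-column mean and scale each column to unit variance.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print("variance before:", X.var(axis=0).round(2))
print("mean after:     ", X_scaled.mean(axis=0).round(6))
print("variance after: ", X_scaled.var(axis=0).round(6))
```

When a separate test set is used, the scaler should be fitted on the training data only and then applied to the test data, so that no test-set statistics leak into training.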

Data Correlations

Saving to a CSV

Modifying dataset.

Train and Test Sets

StratifiedKFold is a variation of k-fold which returns stratified folds: each set contains approximately the same percentage of samples of each target class as the complete set.
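The following sketch shows the stratification on a synthetic, imbalanced target with 2% positives. The number of splits, the shuffling, and the seed are illustrative choices, not settings taken from this article.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic data: 1000 samples, 20 of them positive (2%).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = np.zeros(1000, dtype=int)
y[:20] = 1

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

fold_rates = []
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y), start=1):
    # Positive share inside the held-out fold.
    rate = y[test_idx].mean()
    fold_rates.append(rate)
    print(f"fold {fold}: test size={len(test_idx)}, positive share={rate:.3f}")
```

Each held-out fold keeps roughly the same positive share as the full target, which matters when the positive class is this rare.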

Modeling: scikit-learn Multi-layer Perceptron Classifier (MLP)

This model optimizes the log-loss function using LBFGS or stochastic gradient descent. See sklearn.neural_network.MLPClassifier.
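A minimal sketch of fitting MLPClassifier on an imbalanced toy problem follows. The synthetic data, the hidden layer size, and the iteration limit are assumptions made for illustration; they are not the settings used in this article.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced binary problem (about 5% positives).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Fit the scaler on the training split only, then reuse it on the test split.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# solver can be "lbfgs", "sgd", or "adam"; "lbfgs" is used here as an example.
clf = MLPClassifier(hidden_layer_sizes=(32,), solver="lbfgs",
                    max_iter=500, random_state=0)
clf.fit(X_train_s, y_train)

pred = clf.predict(X_test_s)
proba = clf.predict_proba(X_test_s)[:, 1]   # estimated probability of class 1
print(pred[:10], proba[:3].round(3))
```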

Some of the metrics that we use here to measure the accuracy: \begin{align} \text{Confusion Matrix} = \begin{bmatrix}T_p & F_p\\ F_n & T_n\end{bmatrix}, \end{align}

where $T_p$, $T_n$, $F_p$, and $F_n$ represent true positive, true negative, false positive, and false negative, respectively.

\begin{align} \text{Precision} &= \frac{T_{p}}{T_{p} + F_{p}},\\ \text{Recall} &= \frac{T_{p}}{T_{p} + F_{n}},\\ \text{F1} &= \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}},\\ \text{Balanced-Accuracy (bACC)} &= \frac{1}{2}\left( \frac{T_{p}}{T_{p} + F_{n}} + \frac{T_{n}}{T_{n} + F_{p}}\right). \end{align}

Accuracy can be a misleading metric for imbalanced data sets. In these cases, balanced accuracy (bACC) [6] is recommended: it normalizes true positive and true negative predictions by the number of positive and negative samples, respectively, and divides their sum by two.
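The formulas can be checked on a small hand-made example (20 hypothetical labels, 4 of them positive) and compared against the corresponding scikit-learn functions. Note that sklearn.metrics.confusion_matrix orders its cells as [[T_n, F_p], [F_n, T_p]] for labels [0, 1], which differs from the layout used above.

```python
import numpy as np
from sklearn.metrics import (balanced_accuracy_score, confusion_matrix,
                             f1_score, precision_score, recall_score)

# Hypothetical labels: 4 positives and 16 negatives.
y_true = np.array([1] * 4 + [0] * 16)
# Predictions: 3 positives found, 1 missed, 2 false alarms, 14 correct negatives.
y_pred = np.array([1, 1, 1, 0] + [1, 1] + [0] * 14)

# scikit-learn returns [[T_n, F_p], [F_n, T_p]] for labels [0, 1].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
bacc = 0.5 * (tp / (tp + fn) + tn / (tn + fp))
accuracy = (tp + tn) / len(y_true)

print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
print(f"accuracy={accuracy:.4f} balanced accuracy={bacc:.4f}")

# The hand-computed values agree with the library implementations.
assert np.isclose(precision, precision_score(y_true, y_pred))
assert np.isclose(recall, recall_score(y_true, y_pred))
assert np.isclose(f1, f1_score(y_true, y_pred))
assert np.isclose(bacc, balanced_accuracy_score(y_true, y_pred))
```

In this toy case the plain accuracy is higher than the balanced accuracy because the many correctly classified negatives dominate the former.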

MLP with Default Parameters

MLP with the Best Parameters

Since we have identified the best parameters for our modeling, we train another model using these parameters.
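The text above does not show how the best parameters were found. One common approach, sketched here under that assumption, is a grid search with stratified cross-validation scored by balanced accuracy, followed by retraining with the winning parameters. The parameter grid, the synthetic data, and all settings below are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import (GridSearchCV, StratifiedKFold,
                                     train_test_split)
from sklearn.neural_network import MLPClassifier

# Small synthetic, imbalanced problem (about 10% positives).
X, y = make_classification(
    n_samples=600, n_features=8, weights=[0.9, 0.1], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical grid; a real search would usually cover more values.
param_grid = {
    "hidden_layer_sizes": [(16,), (32, 16)],
    "alpha": [1e-4, 1e-2],
}
search = GridSearchCV(
    MLPClassifier(solver="lbfgs", max_iter=200, random_state=0),
    param_grid,
    scoring="balanced_accuracy",
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
)
search.fit(X_train, y_train)
print("best parameters:", search.best_params_)

# Train another model with the selected parameters and score it on held-out data.
best_mlp = MLPClassifier(solver="lbfgs", max_iter=200, random_state=0,
                         **search.best_params_)
best_mlp.fit(X_train, y_train)
test_bacc = balanced_accuracy_score(y_test, best_mlp.predict(X_test))
print(f"held-out balanced accuracy: {test_bacc:.3f}")
```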

In the next article, we try to improve these results using PyTorch MLP.


References

  1. Kaggle Dataset: Credit Card Fraud Detection
  2. scikit-learn: classifiers
  3. scikit-learn: Metrics and scoring: quantifying the quality of predictions
  4. Confusion matrix
  5. Multi-layer Perceptron classifier
  6. Mower, Jeffrey P. "PREP-Mt: predictive RNA editor for plant mitochondrial genes." BMC bioinformatics 6.1 (2005): 1-15.